Abstract

Dynamical processes taking place on networks have received much attention
in recent years, especially on various models of random graphs
(including small world and scale free networks). They model a variety
of phenomena, including the spread of information on the Internet; the
outbreak of epidemics in a spatially structured population; and communication
between randomly dispersed processors in an ad hoc wireless
network. Typically, research has concentrated on the existence and size
of a large connected component (representing, say, the size of the epidemic)
in a percolation model, or uses differential equations to study the
dynamics using a mean-field approximation in an infinite graph. Here
we investigate the time taken for information to propagate from a single
source through a finite network, as a function of the number of nodes and
the network topology. We assume that time is discrete, and that nodes attempt
to transmit to their neighbours in parallel, with a given probability
of success. We solve this problem exactly for several specific topologies,
and use a large-deviation theorem to derive general asymptotic bounds,
which apply to any family of networks where the diameter grows at least
logarithmically in the number of nodes. We use these bounds, for example,
to show that a scale-free network has propagation time logarithmic
in the number of nodes, and inversely proportional to the transmission
probability.
Figure 1: Nodes that have the information (black) try, in parallel, to copy to
their neighbours. Transmissions succeed with probability p. 

1 Introduction 

Within a few years we will be able to produce vast numbers of microscopic, 
extremely cheap computer processors [1]. These could be randomly distributed
(or painted) on a surface and, by making use of their massive parallelism, form 
an intelligent, computational lawn. However, each processor will only be able 
to communicate over a short range and with limited reliability. An obvious 
question is: how long would a message take to spread across the network starting 
from a single source? Similar questions arise in epidemiology [?].
Given a spatially distributed population, in which individuals infect their
neighbours with a certain probability, how long before the whole population is 
infected? 

There are several different questions that can be posed about propagation
in networks. If there is only one chance for a node to successfully transmit to
its neighbour, one can ask under what conditions (and with what probability)
most or all of the network is "infected" in the steady state. This requires the
investigation of the percolation structure of the system [?].
Alternatively, one can focus on the dynamics. A typical approach here is to
assume an infinite network with sufficient symmetry to make use of a mean-field
approximation in continuous time [11, 13]. In this paper, we wish to
investigate propagation through a finite network of arbitrary topology (including
so-called scale-free networks such as the Internet [5, 3]), assuming that
transmissions take place in parallel, in discrete time steps, and with some fixed
probability of success. Transmissions are attempted at every time step until
successful.
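
The model just described can be sketched as a short simulation. This is a minimal illustration under our own assumptions, not code from the paper: the graph is a connected adjacency dictionary, each informed node makes one independent attempt per neighbour per time step, and the names `propagation_time` and `chain` are hypothetical.

```python
import random

def propagation_time(adj, source, p, rng):
    """Simulate one run; return the number of steps until every node is informed.

    adj maps each node to a list of its neighbours (assumed connected).
    """
    informed = {source}
    steps = 0
    while len(informed) < len(adj):
        newly = set()
        # All informed nodes transmit in parallel: one independent attempt per edge.
        for u in informed:
            for v in adj[u]:
                if v not in informed and rng.random() < p:
                    newly.add(v)
        informed |= newly
        steps += 1
    return steps

def chain(n):
    """Adjacency dictionary of a path on nodes 0..n-1 (an example topology)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

rng = random.Random(0)
runs = [propagation_time(chain(10), 0, 0.5, rng) for _ in range(2000)]
estimate = sum(runs) / len(runs)   # Monte Carlo estimate of the expected time
```

Averaging many runs gives an empirical estimate that can be compared with the exact and asymptotic results derived in the rest of the paper.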

Suppose we have a network of computers and some information is located on
one of them. At each time step, processors with the information try to copy it to
their neighbours, with success probability p (see figure 1). We wish to estimate
the expected time for the information to spread to all nodes, which we call the
propagation time, E[n], on a network containing n nodes. Similarly, one could
consider the spread of an infectious disease where p is the infection rate, or
the spread of a mutant gene in a metapopulation [12, 10]. The simplest network
to consider is a chain of n nodes (see figure 2). If the information starts at one
end, then the expected time, E[n], for it to have crossed the network is (n-1)/p.
This result can be derived from a recurrence relation for the propagation time
given k remaining nodes:

\[ E[k] = 1 + p\,E[k-1] + (1-p)\,E[k] \tag{1} \]

Rearranging gives $E[k] = E[k-1] + 1/p$ and, since E[0] = 0 and n-1 nodes remain
at the start, the total is (n-1)/p.
A similar recurrence enables us to solve the case of a ring of n nodes. The exact
result is complicated but to a good approximation is (n-1)/2p.

Figure 2: A simple chain of n nodes.

2 The general recurrence equation

Such a recurrence can be derived for any network: the propagation time starting
from a particular situation can be broken down into the possible cases occurring
after a single time step, with their associated probabilities.

We consider the general problem of a random sequence $X_1, X_2, \ldots$ of states
from some set A, and a subset $D \subset A$ of desired states. We can derive a
recurrence relation for the first hitting time of the desired subset in terms of
all the situations that could possibly arise after a single time step, and their
associated probabilities. The first hitting time is defined as
$T = \min\{t \mid X_t \in D\}$. In our case, the random sequence comes from the different states of the
network as the information (or infection) is propagated from node to node. The
desired state is when all the nodes have been infected.

Suppose after the first time step the network is in state $X_1 = k$. We consider
the conditional probability space that arises from this situation. We write $E_k$
to denote expectation with respect to this conditional probability space, and
consider the shifted stochastic process

\[ Y_1 = X_2,\; Y_2 = X_3,\; \ldots,\; Y_n = X_{n+1},\; \ldots \]

Let $T_k = \min\{t \mid Y_t \in D\}$ be the first hitting time after this first step has been
made. We then have:

\[ E[T] = \sum_{k \in A} \Pr[X_1 = k] \cdot E_k[T_k] + 1 \]
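Proof

\[
\begin{aligned}
E[T] &= \sum_{j=1}^{\infty} \Pr[T = j] \cdot j \\
&= \sum_{j=1}^{\infty} \Bigl( \sum_{k \in A} \Pr[X_1 = k] \cdot \Pr[T = j \mid X_1 = k] \Bigr) \cdot \bigl((j-1) + 1\bigr) \\
&= \sum_{k \in A} \Pr[X_1 = k] \Bigl( \sum_{j=1}^{\infty} \Pr[T = j \mid X_1 = k] \cdot (j-1) + 1 \Bigr) \\
&= \sum_{k \in A} \Pr[X_1 = k] \Bigl( \sum_{j=1}^{\infty} \Pr[T = j \mid X_1 = k] \cdot (j-1) \Bigr) + \sum_{k \in A} \Pr[X_1 = k] \\
&= \sum_{k \in A} \Pr[X_1 = k] \cdot E_k[T_k] + 1
\end{aligned}
\]

□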
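
The recurrence can be evaluated mechanically on small networks. The sketch below is our own illustration under stated assumptions: states are the sets of informed nodes, each uninformed node with m informed neighbours is reached with probability 1 - q^m in a step (one independent attempt per edge), and the graph is connected. The function name `expected_propagation_time` is hypothetical, and the enumeration is exponential in the frontier size, so it is only practical for small graphs.

```python
from functools import lru_cache
from itertools import combinations

def expected_propagation_time(adj, source, p):
    """Exact E[T] by first-step analysis over sets of informed nodes."""
    nodes = sorted(adj)
    full = frozenset(nodes)
    q = 1.0 - p

    @lru_cache(maxsize=None)
    def expect(informed):
        if informed == full:
            return 0.0
        # Probability that each frontier node is reached during this step.
        reach = {}
        for v in nodes:
            if v not in informed:
                m = sum(1 for u in adj[v] if u in informed)
                if m:
                    reach[v] = 1.0 - q ** m
        frontier = list(reach)
        stay = 1.0                      # probability that nothing changes
        for v in frontier:
            stay *= 1.0 - reach[v]
        total = 1.0
        for size in range(1, len(frontier) + 1):
            for new in combinations(frontier, size):
                prob = 1.0
                for v in frontier:
                    prob *= reach[v] if v in new else 1.0 - reach[v]
                total += prob * expect(informed | frozenset(new))
        # E = 1 + stay * E + sum(prob * E_next)  =>  E = total / (1 - stay)
        return total / (1.0 - stay)

    return expect(frozenset([source]))

chain4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(expected_propagation_time(chain4, 0, 0.5))
```

For the four-node chain with p = 0.5 this agrees with the closed form (n-1)/p from the introduction.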

3 The hub model 

A more complex example is the hub, in which the information starts at a central
node which repeatedly tries to transmit to n neighbours. For example, consider
a transmitter signalling to a number of receivers, or a collection of n people
that independently have probability p of contracting a disease. The recurrence
relation becomes:

\[ E[n] = 1 + \sum_{k=0}^{n} \binom{n}{k} p^{n-k} q^{k} E[k] \]

where q = 1 - p and k is the number of neighbours still without the information
after one time step. We rearrange to give the recurrence relation:

\[ (1 - q^n)\,E[n] = 1 + \sum_{k=0}^{n-1} \binom{n}{k} p^{n-k} q^{k} E[k] \]

with E[0] = 0. We claim that $E[n] = \Theta(\log(n+1))$.

We first prove the general result that, if k is distributed according to the
Binomial distribution B(n, q), then the expected value of log(k + 1) is
$\Theta(\log(n+1))$.

Proof To prove the upper bound, first note that, for $0 < q < 1$, and for all
$n \ge 1$,

\[ nq + 1 \le \frac{(n+1)(1+q)}{2} \]

Then, by the concavity of the logarithm,

\[ \sum_{k=0}^{n} \binom{n}{k} p^{n-k} q^{k} \log(k+1) \le \log(nq+1) \le \log(n+1) - \log\frac{2}{1+q} \]

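Now note that the lower bound,
$\sum_{k=0}^{n} \binom{n}{k} p^{n-k} q^{k} \log(k+1) \ge \log(n+1) - 1/q$,
holds if and only if:

\[ \sum_{k=0}^{n} \binom{n}{k} p^{n-k} q^{k} \log\frac{n+1}{k+1} \le \frac{1}{q} \]

which we prove as follows:

\[
\begin{aligned}
\sum_{k=0}^{n} \binom{n}{k} p^{n-k} q^{k} \log\frac{n+1}{k+1}
&= \sum_{k=0}^{n} \binom{n}{k} p^{n-k} q^{k} \log\Bigl(1 + \frac{n-k}{k+1}\Bigr) \\
&\le \sum_{k=0}^{n} \binom{n}{k} p^{n-k} q^{k}\, \frac{n-k}{k+1} \\
&\le \sum_{k=0}^{n} \binom{n}{k} \frac{n+1}{k+1}\, p^{n-k} q^{k} \\
&= \sum_{k=0}^{n} \binom{n+1}{k+1} p^{n-k} q^{k} \\
&= \sum_{k=1}^{n+1} \binom{n+1}{k} p^{n-k+1} q^{k-1} \\
&= \frac{1}{q} \sum_{k=1}^{n+1} \binom{n+1}{k} p^{n+1-k} q^{k} \\
&\le \frac{1}{q}
\end{aligned}
\]

(where we used the fact that $\log(1 + x) \le x$ for all $x \ge 0$). □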
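
The two bounds on the expected value of log(k + 1) can be checked numerically by direct summation. The sketch below is our own illustration; the constants 1/q and log(2/(1+q)) are our reading of the proof above, natural logarithms are assumed, and the function names are hypothetical.

```python
from math import comb, log

def expected_log(n, q):
    """E[log(k + 1)] for k ~ Binomial(n, q), by direct summation."""
    p = 1.0 - q
    return sum(comb(n, k) * p ** (n - k) * q ** k * log(k + 1) for k in range(n + 1))

def lemma_bounds(n, q):
    """Lower and upper bounds on E[log(k + 1)] as read from the proof."""
    lower = log(n + 1) - 1.0 / q
    upper = log(n + 1) - log(2.0 / (1.0 + q))
    return lower, upper

for n in (1, 10, 50):
    lo, hi = lemma_bounds(n, 0.5)
    print(n, lo, expected_log(n, 0.5), hi)
```

Both bounds differ from log(n + 1) only by constants that depend on q, which is what makes the expectation Θ(log(n + 1)).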
We now use this result to show that the propagation time for a hub with n
clients is $E[n] = \Theta(\log(n+1))$. In particular,

\[ q \log(n+1) \le E[n] \le \frac{1}{\log\bigl(2/(1+q)\bigr)} \log(n+1) \]

Proof We prove, by induction, that $E[n] \le A \log(n+1)$, where

\[ A = \frac{1}{\log\bigl(2/(1+q)\bigr)} \]

The case n = 0 is easy, since $E[0] = 0 = A \log 1$. Now suppose $n \ge 1$ and
that the hypothesis is true for all $0 \le k < n$. Then

\[
\begin{aligned}
(1-q^n)\,E[n] &= 1 + \sum_{k=0}^{n-1} \binom{n}{k} p^{n-k} q^{k} E[k] \\
&\le 1 + \sum_{k=0}^{n-1} \binom{n}{k} p^{n-k} q^{k} A \log(k+1) \\
&= 1 - A q^n \log(n+1) + A \sum_{k=0}^{n} \binom{n}{k} p^{n-k} q^{k} \log(k+1) \\
&\le 1 - A q^n \log(n+1) + A \log(n+1) - A \log\frac{2}{1+q} \\
&= A (1-q^n) \log(n+1)
\end{aligned}
\]

and the result follows.

Secondly, we show by induction that $E[n] \ge q \log(n+1)$. The case n = 0 is
again easy. Now suppose the hypothesis is true for all $0 \le k < n$. Then

\[
\begin{aligned}
(1-q^n)\,E[n] &\ge 1 + q \sum_{k=0}^{n-1} \binom{n}{k} p^{n-k} q^{k} \log(k+1) \\
&= 1 - q^{n+1} \log(n+1) + q \sum_{k=0}^{n} \binom{n}{k} p^{n-k} q^{k} \log(k+1) \\
&\ge 1 - q^{n+1} \log(n+1) + q \Bigl( \log(n+1) - \frac{1}{q} \Bigr) \\
&= q (1-q^n) \log(n+1)
\end{aligned}
\]

□
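
The hub recurrence is easy to evaluate numerically, which gives a direct check of the logarithmic bounds. This is a sketch under our own assumptions: natural logarithms, and the bounds q log(n+1) and log(n+1)/log(2/(1+q)) as we read them from the proof above. The name `hub_times` is hypothetical.

```python
from math import comb, log

def hub_times(n_max, p):
    """Return [E[0], ..., E[n_max]] for the hub.

    Uses (1 - q^n) E[n] = 1 + sum_{k<n} C(n, k) p^(n-k) q^k E[k], with E[0] = 0.
    """
    q = 1.0 - p
    E = [0.0]
    for n in range(1, n_max + 1):
        s = sum(comb(n, k) * p ** (n - k) * q ** k * E[k] for k in range(n))
        E.append((1.0 + s) / (1.0 - q ** n))
    return E

p = 0.3
q = 1.0 - p
A = 1.0 / log(2.0 / (1.0 + q))
E = hub_times(100, p)
for n in (1, 10, 100):
    print(n, q * log(n + 1), E[n], A * log(n + 1))
```

With a single client the recurrence reduces to E[1] = 1/p, the mean of a geometric distribution.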



4 Epidemiology and perfect mixing 

In epidemiological models of the spread of infectious diseases, it is common to
assume perfect mixing: that every individual interacts with every other individual.
This corresponds to having a complete graph, with every node connected
to every other node. The propagation time for a complete graph is bounded
by a constant: it does not depend on the number of nodes. Moreover, as n
gets large, the propagation time is just two time steps (with increasingly high
probability). Suppose we have a complete graph on n + 1 nodes and initially
one node is infected. The probability that the infection passes to a neighbour is
p and we let q = 1 - p. After a single time step it is very likely that close to np
new nodes have been infected. In fact, Chernoff's inequality [2] tells us that
the probability that fewer than $(1-\delta)np$ nodes have been infected is less than
$\exp(-np\delta^2/2)$, where we can make $\delta$ as small as we like. This means that, the
more nodes there are, the surer we can be that nearly np nodes are infected after
one time step. The probability that the remaining nodes get infected on the next
time step is therefore close to $(1 - q^{np})^{nq}$. Using the fact that $(1-x)^n \ge 1 - nx$
for all $0 \le x \le 1$, we see that $(1 - q^{np})^{nq} \ge 1 - nq^{np+1} \ge 1 - r^n$ (for some r in
the range $0 < q^p < r < 1$, once n is large enough), which is $1 - e^{-\Theta(n)}$.

Similarly for a complete bipartite graph, with n nodes in each set, the proba- 
bility that a single node in one set infects np nodes in the second gets arbitrarily 
close to 1 as n gets large. This is then enough to infect all the other nodes in 
the first set (again with arbitrarily high probability). Then, on the third time
step, the remaining nodes of the second set get infected. This analysis can be
extended to complete multipartite graphs in an obvious way. 

In our model, nodes are either infected or not. This corresponds to the SI
model of epidemiology (Susceptible-Infected [?]). If I is the number of infected
people and S = n - I are the remaining susceptibles, we would like to know how
many more people become infected in one time step. For the complete graph
(perfect mixing) the number of newly infected people is binomially distributed
between 0 and S with success probability $1 - (1-p)^I$. If p is small, this is
approximately equal to pI and so the expected increase in infected people is
close to $\Delta I \approx pSI$, which agrees with the standard SI model. To extend our
model to more realistic scenarios, one would have to introduce a third state R
(removed) for those people who can no longer be infected (due to immunity or
death).
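
Because the number of newly infected people in a complete graph depends only on the current number infected, the expected propagation time can be computed exactly from a recurrence over that count. The sketch below is our own illustration, not a computation from the paper; the name `complete_graph_time` is hypothetical, and the direct use of `math.comb` limits it to graphs of at most a few hundred nodes.

```python
from math import comb

def complete_graph_time(total_nodes, p):
    """Expected time to infect a complete graph, starting from one infected node."""
    q = 1.0 - p
    E = [0.0] * (total_nodes + 1)      # E[i]: expected remaining time with i infected
    for i in range(total_nodes - 1, 0, -1):
        s = total_nodes - i            # number of susceptibles
        hit = 1.0 - q ** i             # chance a given susceptible is infected this step
        # Newly infected count is Binomial(s, hit).
        pmf = [comb(s, j) * hit ** j * (1.0 - hit) ** (s - j) for j in range(s + 1)]
        # First-step recurrence, solved for E[i] (pmf[0] is the no-change probability).
        E[i] = (1.0 + sum(pmf[j] * E[i + j] for j in range(1, s + 1))) / (1.0 - pmf[0])
    return E[1]

for size in (2, 11, 51, 201):
    print(size, complete_graph_time(size, 0.5))
```

For large graphs the value settles very close to two time steps, in line with the argument above.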

5 General upper and lower bounds 

For the general case, a lower bound on the propagation time, starting from a
particular node, is given by the eccentricity of that node (divided by p). The
eccentricity is the distance from the source to the most distant node in the network. This
is because at least that number of successful transmissions will have to be made
for the whole network to be infected. It is also possible to derive a general upper
bound on the propagation time for an arbitrary network. The idea is to replace
the network with a minimum spanning tree, rooted at the starting node. The
propagation time on the tree is at least that of the original network, since we
may have lost a number of "short-cuts". We then replace the tree with a star
graph, with a hub at the starting node, and b branches: one for every leaf of the
tree. The length of each branch is the eccentricity, d, of the hub (see figure 3).
Using Chernoff bounds, we can prove that the propagation time for a star graph
is $O((d + \log b)/p)$.

To show this, we return to the general problem of estimating the first hitting
time of a desired subset of states in a random sequence. Let A be a set of states
and let $D \subset A$ be the desired subset. Let Y(1), Y(2), ... be a random sequence
of states from A that satisfies the following monotonicity properties:

1. If $Y(t) \in D$, then for all k > 0, $Y(t+k) \in D$.

2. $\Pr[Y(t+k) \in D \mid Y(t) \notin D] \ge \Pr[Y(k) \in D]$ for all t, k > 0.

Figure 3: To find an upper bound for the propagation time of a network, we
first find a minimum spanning tree. From this, we create a star graph, which
we then balance.

In other words, the probability of reaching the desired state after a given time 
interval always improves, and once it is reached, it is never left. In the case of 
the star graph, the random variables Y(t) will represent the minimum number 
of infected nodes along each branch at time t. The desired state is that all nodes 
on all branches are infected. Clearly, the probability of achieving this state in 
a fixed amount of time can only improve as time goes by, so the monotonicity 
condition is satisfied. 

Define the first hitting time of the sequence to be $T(D) = \min\{t \mid Y(t) \in D\}$.
Suppose we can find a time $\tau$ and constant $0 < \epsilon < 1$ such that

\[ \Pr[Y(\tau) \in D] \ge 1 - \epsilon \]

Then we claim that

\[ E[T] \le \frac{\tau}{1 - \epsilon} \]

That is, if after some time $\tau$ (which will in general depend on the structure
of the problem), we have some lower bound on the success probability at that 
time, then we can use this fact to estimate the first hitting time for the whole 
process. 

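Proof According to the definition of expectation we have

\[
\begin{aligned}
\sum_{k=1}^{\infty} k \Pr[T = k]
&= \Bigl( \sum_{k=1}^{\tau} k \Pr[T = k] \Bigr) + \Bigl( \sum_{k=\tau+1}^{2\tau} k \Pr[T = k] \Bigr) + \Bigl( \sum_{k=2\tau+1}^{3\tau} k \Pr[T = k] \Bigr) + \cdots \\
&\le \Bigl( \tau \sum_{k=1}^{\tau} \Pr[T = k] \Bigr) + \Bigl( 2\tau \sum_{k=\tau+1}^{2\tau} \Pr[T = k] \Bigr) + \Bigl( 3\tau \sum_{k=2\tau+1}^{3\tau} \Pr[T = k] \Bigr) + \cdots \\
&= \tau \Bigl( \sum_{k=1}^{\infty} \Pr[T = k] \Bigr) + \tau \Bigl( \sum_{k=\tau+1}^{\infty} \Pr[T = k] \Bigr) + \tau \Bigl( \sum_{k=2\tau+1}^{\infty} \Pr[T = k] \Bigr) + \cdots \\
&= \tau \bigl( 1 + \Pr[Y(\tau) \notin D] + \Pr[Y(2\tau) \notin D] + \cdots \bigr)
\end{aligned}
\]

Now the second monotonicity condition can be equivalently stated as:

\[ \Pr[Y(t+k) \notin D \mid Y(t) \notin D] \le \Pr[Y(k) \notin D] \]

for all t, k > 0. Using the definition of conditional probability (and the fact
that, by the first condition, $Y(t+k) \notin D$ implies $Y(t) \notin D$), this gives us:

\[ \Pr[Y(t+k) \notin D] \le \Pr[Y(t) \notin D] \, \Pr[Y(k) \notin D] \]

We already know that $\Pr[Y(\tau) \notin D] \le \epsilon$. And by induction on m we get
$\Pr[Y(m\tau) \notin D] \le \epsilon^m$. Therefore

\[ E[T] \le \tau \sum_{m=0}^{\infty} \epsilon^m = \frac{\tau}{1 - \epsilon} \]

as required. □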
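
As a sanity check on the lemma, consider a hypothetical memoryless process (our own illustration, not an example from the paper) that enters D independently with probability s at every step. It satisfies both monotonicity properties, and T is geometric, so the bound can be compared with the exact value:

```latex
\[
E[T] = \frac{1}{s}, \qquad
\epsilon = \Pr[Y(\tau) \notin D] = (1-s)^{\tau}, \qquad
\frac{\tau}{1-\epsilon} = \frac{\tau}{1-(1-s)^{\tau}} \;\ge\; \frac{\tau}{\tau s} = \frac{1}{s},
\]
% using Bernoulli's inequality (1-s)^{\tau} \ge 1 - \tau s.
```

The bound is tight for τ = 1 and loosens as τ grows, so in applications τ should be chosen as small as the available tail estimate allows.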
Now consider propagation in a star graph with b branches of depth d. The
problem is equivalent to b parallel repeating Bernoulli trials $X_1(t), X_2(t), \ldots, X_b(t)$,
each with success probability p. $X_j(t)$ is the number of infected nodes on branch
j at time t. We want the expected time until all of them have achieved at least
d successes. So let $Y(t) = \min_{1 \le i \le b} X_i(t)$, which certainly satisfies the
monotonicity requirement. Then

\[
\begin{aligned}
\Pr[Y(t) < (1-\delta)tp]
&= \Pr\bigl[ (X_1(t) < (1-\delta)tp) \vee \cdots \vee (X_b(t) < (1-\delta)tp) \bigr] \\
&\le \sum_{i=1}^{b} \Pr[X_i(t) < (1-\delta)tp] \\
&\le b \exp\Bigl( -\frac{tp\,\delta^2}{2} \Bigr)
\end{aligned}
\]

where we have applied Chernoff's inequality. Now we choose time
$\tau = \frac{8}{p}(d + \log b)$ and $\delta = 1/2$. Notice that $\tau \ge 2d/p$ and so $d \le \tau p/2$. Therefore

\[
\begin{aligned}
\Pr[Y(\tau) < d] &\le \Pr\bigl[ Y(\tau) < \tfrac{1}{2}\tau p \bigr] \\
&\le b \exp\Bigl( -\frac{\tau p}{8} \Bigr) \\
&= b \exp(-d - \log b) \\
&= \exp(-d) \\
&\le \exp(-1)
\end{aligned}
\]

So the probability that we have achieved the desired state by time $\tau$ is at least
$1 - e^{-1}$. Applying the lemma, we conclude that the expected time to completion
is at most

\[ \frac{8(d + \log b)}{p\,(1 - e^{-1})} \]

□
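
The star-graph bound can be compared with a direct simulation of b parallel branches. The sketch below is a rough illustration under our own assumptions: each branch advances by one node per step with probability p until it holds d infected nodes, and the bound is taken as 8(d + log b)/(p(1 - 1/e)) with natural logarithms, which is our reading of the result above. The function names are hypothetical.

```python
import math
import random

def star_time(b, d, p, rng):
    """One run: steps until all b branches (each a chain of d nodes) are infected."""
    progress = [0] * b                  # infected nodes on each branch
    steps = 0
    while min(progress) < d:
        # Every unfinished branch makes one transmission attempt per step.
        progress = [x + 1 if x < d and rng.random() < p else x for x in progress]
        steps += 1
    return steps

def star_upper_bound(b, d, p):
    """Upper bound on the expected completion time, as read from the proof."""
    return 8.0 * (d + math.log(b)) / (p * (1.0 - math.exp(-1.0)))

rng = random.Random(2)
mean = sum(star_time(8, 5, 0.5, rng) for _ in range(500)) / 500
print(mean, star_upper_bound(8, 5, 0.5))
```

In runs of this kind the simulated mean sits well below the bound, which is expected: the constant 8/(1 - 1/e) comes from a convenient choice of δ and τ rather than an optimised one.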

The upper bound for the star is also an upper bound for the original network.
Since the eccentricity of any node in a network is at most the diameter D of
the network (the greatest distance between any two nodes of the network),
and the number of leaf nodes b is less than n, we have a general upper bound
on the propagation time for networks of $O((D + \log n)/p)$. We interpret the
diameter D as the time associated with the depth of the network, and the factor
log n with the breadth. Up to a constant factor, the bound is the maximum of
these two factors.
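
Both bounds can be computed for a concrete network from a single breadth-first search. The sketch below is our own illustration with several assumptions: it uses a breadth-first (shortest-path) tree as the spanning tree so that branch lengths match the eccentricity, it counts the leaves of that tree as b, and it uses the explicit constant from the star-graph proof as we read it. The name `propagation_bounds` is hypothetical.

```python
import math
from collections import deque

def propagation_bounds(adj, source, p):
    """Return (lower, upper) bounds on the expected propagation time from source."""
    depth = {source: 0}
    has_child = set()
    queue = deque([source])
    while queue:                        # breadth-first search builds the tree
        u = queue.popleft()
        for v in adj[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                has_child.add(u)
                queue.append(v)
    d = max(depth.values())             # eccentricity of the source
    leaves = sum(1 for v in depth if v not in has_child and v != source)
    b = max(1, leaves)                  # number of branches of the star
    lower = d / p
    upper = 8.0 * (d + math.log(b)) / (p * (1.0 - math.exp(-1.0)))
    return lower, upper

chain5 = {i: [j for j in (i - 1, i + 1) if 0 <= j < 5] for i in range(5)}
print(propagation_bounds(chain5, 0, 0.5))
```

For the five-node chain the lower bound d/p coincides with the exact value (n-1)/p, while the upper bound is loose by the constant factor carried over from the Chernoff argument.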

6 Results for various networks 

We can use these bounds to derive asymptotic results for a range of network
topologies. A random graph on n nodes is created by assigning an edge between
each pair of nodes with a given probability. The diameter of such graphs grows as log n.
Applying our bounds then gives a propagation time of $\Theta((\log n)/p)$.

For scale-free networks with degree distribution $p(k) \propto k^{-\lambda}$ there are three
cases [?]. For $\lambda > 3$, the diameter grows as log n and so again the propagation
time is $\Theta((\log n)/p)$. For $2 < \lambda < 3$, the diameter grows much more slowly, as
log log n. In this case the propagation time is between log log n and log n. The
third case is $\lambda = 3$, for which the diameter grows as log n / log log n, which again
acts as a lower bound on the propagation time.

Hierarchies in organisational structures may be modelled by tree networks.
A complete binary tree has depth log n, and so the propagation time, starting
at the root node, is O(log n). A lattice structure is commonly used in artificial
life models (such as cellular automata). Each individual is connected to the
neighbours to the north, east, south and west. The diameter of such a network
(and therefore the propagation time) is $\Theta(\sqrt{n})$. This is considerably slower than
for random and small-world networks. It is known that small-world networks
can be constructed from lattices by introducing a small number of random
"short-cuts" between nodes [12]. We see that by doing this, we dramatically
reduce the propagation time.
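
The square-root growth of the lattice diameter is easy to confirm on small instances. The sketch below is our own illustration: it builds a k by k lattice without wrap-around (an assumption; a torus would halve the distances) and measures the diameter by breadth-first search from every node. The function names are hypothetical.

```python
from collections import deque

def grid(k):
    """k x k lattice; each node links to its north, east, south and west neighbours."""
    adj = {}
    for r in range(k):
        for c in range(k):
            nbrs = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            adj[(r, c)] = [(a, b) for a, b in nbrs if 0 <= a < k and 0 <= b < k]
    return adj

def eccentricity(adj, source):
    """Greatest shortest-path distance from source, by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(adj):
    """Maximum eccentricity over all nodes."""
    return max(eccentricity(adj, s) for s in adj)

for k in (5, 10):
    print(k * k, diameter(grid(k)))
```

With n = k^2 nodes the diameter is 2(k - 1), that is, roughly twice the square root of n, so the lower bound D/p on the propagation time grows in the same way.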

If the diameter grows logarithmically in the number of nodes or faster, then
it determines the propagation time (that is, it dominates the "breadth" factor
given by the number of branches in the spanning tree). In this case, the
propagation time is also inversely proportional to p. However, if it grows slower
than logarithmically, we do not get so much information from our bounds. For
example, both the complete graph and the hub have constant diameters: our
bounds cannot distinguish these cases.

Figure 4: Nodes are distributed randomly in the unit square and are connected
if they are within a distance of r from each other. We tile the square so that
the nodes in each tile form a complete graph.

This situation occurs in our final example, in which nodes are spatially
distributed. Imagine a square with unit length sides. Nodes are distributed
randomly in the square and an edge is drawn between nodes that are less than
a distance r apart. For example, this could model a random distribution of
processors in a computational lawn that have a limited transmission range [?].
The furthest apart two nodes can be geometrically is $\sqrt{2}$, so the diameter of the
network, as n increases, approaches $\sqrt{2}/r$. A lower bound on the propagation
time is therefore c/r, for some constant c, which does not depend on n. We
will show that the propagation time is also bounded above by a constant and
is inversely proportional to r. To do this, we divide the square up into disjoint
tiles with side length $r/\sqrt{2}$ (see figure 4). The diagonal of each tile is r, so all
the nodes in a tile are connected to all the others. The nodes of one tile in
isolation form a complete graph, for which the propagation time is constant.
Now consider two neighbouring tiles, with a common edge. Place a third tile
so that it covers half of each of these. The nodes in the third tile again form a
complete graph, with constant propagation time. This means that the expected
time for propagation from one of the original tiles to its neighbour is a constant.
The situation, therefore, reduces to the constant time spread of information from
tile to tile. Since there are $\sqrt{2}/r$ tiles along each side of the square, the number
of tiles that have to be traversed on a path between the corners is proportional
to 1/r. Since each move takes place in constant time, the result follows.
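
The key geometric step of the tiling argument, that all nodes inside one tile are mutually connected, can be checked directly. The sketch below is our own illustration: it assumes tiles of side r/√2 aligned with the origin, treats a pair at distance at most r as connected (with a small floating-point tolerance), and uses hypothetical function names.

```python
import math
import random

def tile_index(point, r):
    """Index of the square tile (side r / sqrt(2)) containing point."""
    side = r / math.sqrt(2.0)
    return (int(point[0] // side), int(point[1] // side))

def tiles_are_cliques(points, r):
    """True if every pair of points sharing a tile is within distance r."""
    tiles = {}
    for pt in points:
        tiles.setdefault(tile_index(pt, r), []).append(pt)
    return all(
        math.dist(a, b) <= r + 1e-12
        for members in tiles.values()
        for a in members
        for b in members
    )

rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(500)]
print(tiles_are_cliques(points, 0.2), math.ceil(math.sqrt(2.0) / 0.2))
```

The second printed value is the number of tiles needed along one side of the unit square, which grows as 1/r and does not depend on the number of nodes.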